We introduce a new protocol for lossy data compression based on constraint satisfaction gates. We show that the theoretical capacity of algorithms built from standard parity-check gates converges exponentially fast to the Shannon bound as the number of variables seen by each gate increases. We then generalize this approach by introducing random gates. Their theoretical performance is nearly as good as that of parity checks, but they offer the great advantage that the encoding can be done in linear time using the Survey Inspired Decimation algorithm, a powerful algorithm for constraint satisfaction problems derived from statistical physics.
Introduction. Constraint satisfaction problems (CSPs) are at the heart of an emerging field of research which is of interest to statistical physics, combinatorial optimization, statistical inference and information theory […]. Broadly speaking, these are problems involving a large number of variables taking values in a finite set (hereafter we restrict ourselves to binary variables). Each constraint involves $K$ variables and imposes a probability law on the $2^K$ possible assignments of the variables in this subset. Hard constraints simply forbid some of the configurations. The spin glass problem […], the satisfiability problem, which lies at the heart of the theory of computational complexity in computer science […], and the parity-check problems used in error-correcting codes all belong to this category.

A lot of progress has been made in recent years in the study of random constraint satisfaction problems, where each constraint involves variables chosen at random with uniform distribution […]. This is the natural setting for spin glasses; it offers the possibility to study issues of typical-case complexity in satisfiability, and it provides some of the most efficient codes for error correction. In several cases it has been found that, as the density of constraints increases, the system first enters a 'clustered' phase before reaching the threshold of unsatisfiability, beyond which it cannot meet all the constraints. Above this threshold, the configurations which violate the smallest number of constraints are also clustered. Clustering means that the configurations which satisfy all the constraints are grouped into many disconnected clusters which are distant from each other. Statistical physics methods originating from spin glass theory, like the replica and cavity methods, turn out to be very efficient for studying these phenomena […], and some of the results have recently been confirmed rigorously […]. They have also led to a powerful algorithm (survey propagation) which is able to solve very large problems in the clustered region […].

We will show how one can take advantage of these clustered phases to address a classic problem in coding theory, lossy data compression. While a large amount of work has been done in this field, a number of challenging problems remain open, among them the realization of a practical compression protocol for correlated sources, or the exponentially increasing time of the encoding/decoding step in typical algorithms. Among lossy compression schemes, it is worth mentioning in particular the good performance of algorithms based on the codes developed in the context of channel coding. Here we propose an alternative strategy and, as a starting point, we focus on the case of uncorrelated sources.

The problem of lossy data compression can be summarized as follows. We have a source alphabet $\mathcal{A}$, a source distribution $p(x_a)$ and a distortion measure $d(\cdot,\cdot)$ which takes values in $[0,1]$. We start from an original message, a sequence $\{x_a\}$ of $M$ values independently drawn from the source distribution. The purpose of data compression is to map this message to a string of $N$ bits, with $N < M$, so that it can, for example, be easily transmitted or stored. One then wants to decode this $N$-bit string in order to reconstruct a sequence as close as possible to the original message. We call the decoded message $\{x^*_a\}$, and we want to minimize the expected value of the distortion $D = \mathbb{E}\,\sum_{a=1}^{M} d(x_a, x^*_a)/M$. How small a distortion one can achieve depends on the rate $R = N/M$ at which the original message has been compressed.
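As a minimal illustration of these definitions, the Python sketch below assumes the Hamming distortion for binary symbols ($d(x, x^*) = 0$ if $x = x^*$ and 1 otherwise, the measure used later in the text) and replaces the expectation by the empirical average over a single message. The function names are hypothetical.

```python
def hamming_distortion(x, x_star):
    """Empirical distortion D = (1/M) * sum_a d(x_a, x*_a), with d = 1 if the bits differ."""
    if len(x) != len(x_star):
        raise ValueError("original and decoded messages must both have length M")
    return sum(a != b for a, b in zip(x, x_star)) / len(x)

def compression_rate(N, M):
    """Rate R = N / M: number of stored bits per source symbol."""
    return N / M

x      = [0, 1, 1, 0, 1, 0, 0, 1]   # original message, M = 8
x_star = [0, 1, 0, 0, 1, 0, 1, 1]   # decoded message (2 bits differ)
print(hamming_distortion(x, x_star))  # 0.25
print(compression_rate(4, len(x)))    # 0.5
```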

The rate-distortion theorem proved by Shannon […] provides a bound on the minimum rate at which compression is possible once we fix the average distortion we tolerate. The rate-distortion function $R(D)$ is not known in closed form in the most general case of correlated (memory) sources, and it is most often obtained by means of a numerical algorithm (see e.g. [15]). On the other hand, for an uncorrelated unbiased binary source (i.e. with $p(x_a = 0) = 1 - p(x_a = 1) = 1/2$), the rate-distortion function in the large-$N$ limit has the simple expression

$$R(D) = 1 + D\log_2 D + (1-D)\log_2(1-D) \,. \qquad (1)$$

FIG. 1: A Tanner graph is a convenient representation for a CSP. We emphasize in this cartoon the topological support of our protocol (variable nodes $y = \{y_1, \dots, y_5\}$).
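Equation (1) is easy to evaluate, and since $R(D)$ decreases monotonically from 1 to 0 on $[0, 1/2]$, it can be inverted by bisection to give the smallest achievable distortion $D_{\rm Sh}(R)$ at a given rate. The Python sketch below does this; the helper names and the bisection tolerance are our own choices.

```python
import math

def rate_distortion(D):
    """Shannon R(D) = 1 + D log2 D + (1-D) log2(1-D) for an unbiased binary
    source with Hamming distortion, on 0 <= D <= 1/2."""
    if not 0.0 <= D <= 0.5:
        raise ValueError("R(D) is used here only for 0 <= D <= 1/2")
    if D == 0.0:
        return 1.0  # limit D log2 D -> 0
    return 1.0 + D * math.log2(D) + (1.0 - D) * math.log2(1.0 - D)

def shannon_distortion(R, tol=1e-12):
    """Invert R(D) by bisection: minimal achievable distortion at rate R in [0, 1]."""
    if not 0.0 <= R <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    lo, hi = 0.0, 0.5  # R(D) decreases from 1 to 0 on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_distortion(mid) > R:
            lo = mid  # rate at mid still above R: more distortion is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, at rate $R = 1/2$ (i.e. $\alpha = 2$ in the notation introduced below) the Shannon distortion is about 0.110.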

As a first requirement, a good lossy data compression algorithm must be able to approach this theoretical limit. One should mention that in the lossless case, that is, the $D \to 0$ limit, practical algorithms that asymptotically saturate the Shannon bound were discovered long ago […]. The lossy case turns out to be more difficult from the algorithmic point of view. Recently, a perceptron with a non-linear transfer function has been proposed as a lossy compressor […], and it has been shown analytically to achieve theoretically optimal performance; however, its practical use is strongly limited by the fact that no polynomial algorithm is known in this case, so the typical string lengths that can be compressed in reasonable time are rather short.

The general idea. We are interested here in developing an approach to the problem based on CSPs, as suggested in […]. Our CSP uses $M$ constraints between $N$ Boolean variables taking values in $\{0,1\}$. Each constraint, say $a$, is actually a gate controlled by the value $x_a$ of the $a$-th bit of the original message (see Fig. 1). The gate $a$ is connected to $K_a$ randomly chosen variables. The $2^{K_a}$ possible configurations of these variables are partitioned into two subsets of equal size, $S_a$ and $U_a$. When the control bit is $x_a = 0$, the configurations in $S_a$ satisfy the constraint and those in $U_a$ do not. When the control bit is $x_a = 1$, the configurations in $U_a$ satisfy the constraint and those in $S_a$ do not. A simple example is provided by parity checks: $S_a$ consists of the configurations with an even number of 0's. The gate then performs a linear operation: it checks whether the sum of all its variables and $x_a$ is equal to $K_a$ modulo 2 (i.e. to 0 when $K_a$ is even). In this case the CSP is nothing but the well-known XORSAT problem of computer science [19]; it can also be seen as a spin glass problem with three-spin interactions.
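A gate is fully specified by its subset $S_a$ and its control bit. The Python sketch below (class and function names are illustrative, not taken from the paper) encodes the rule above: the gate's output is 0 when the configuration of its inputs lies in $S_a$ and 1 when it lies in $U_a$, and the gate is satisfied exactly when this output equals $x_a$. The parity check is built with $S_a$ = configurations with an even number of 0's.

```python
from itertools import product

class Gate:
    """Constraint acting on K variables, controlled by a message bit x_a.

    S holds the K-bit configurations (tuples of 0/1) that satisfy the gate
    when x_a = 0; its complement U satisfies the gate when x_a = 1.
    """
    def __init__(self, K, S):
        S = frozenset(S)
        all_configs = set(product((0, 1), repeat=K))
        if not S <= all_configs or len(S) != 2 ** (K - 1):
            raise ValueError("S must be a subset of {0,1}^K of size 2^(K-1)")
        self.K = K
        self.S = S

    def output(self, y_sub):
        """Reconstructed bit x*_a: 0 if the configuration lies in S, else 1."""
        return 0 if tuple(y_sub) in self.S else 1

    def satisfied(self, y_sub, x_a):
        """Satisfied iff the configuration is in S (x_a = 0) or in U (x_a = 1)."""
        return self.output(y_sub) == x_a

def parity_gate(K):
    """Parity check: S = configurations with an even number of 0's."""
    return Gate(K, [c for c in product((0, 1), repeat=K) if c.count(0) % 2 == 0])

g = parity_gate(3)
print(g.satisfied((1, 1, 0), 0))  # False: one 0 (odd), so the configuration is in U
print(g.satisfied((1, 1, 0), 1))  # True
```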

In our procedure, the initial word of $M$ bits is used to build a CSP with $M$ constraints on $N$ ($< M$) variables $\{y_1, \dots, y_N\}$. We then look for a configuration $y^*$ of the variables that minimizes the number of violated constraints, that is, a ground-state configuration. This is the encoded (compressed) word. Of course, this step is nontrivial, since one must be able to handle a CSP which is in its "unsat" phase. Note that the rate $R$ of the process is simply related to the density of constraints $\alpha = M/N$ by $R = 1/\alpha$. Once we have a ground-state configuration, decoding is easy: for each constraint $a$, one considers the configuration of the subset of $K_a$ variables appearing in $a$. If it lies in $S_a$, the reconstructed bit is $x^*_a = 0$; otherwise it is $x^*_a = 1$. Since constraint $a$ is violated exactly when $x^*_a \neq x_a$, the number of bits of the original message which are wrongly reconstructed is nothing but the number of constraints violated in a ground-state configuration.
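For very small instances the whole protocol can be run end to end by exhaustive search, which makes the correspondence between wrongly reconstructed bits and violated constraints explicit. The Python sketch below assumes parity-check gates, a random graph shared by encoder and decoder (for instance generated from a common seed, an assumption on our part), and brute-force minimization over all $2^N$ words in place of a real CSP solver; it is only meant for toy sizes, and all names are hypothetical.

```python
import random
from itertools import product

def random_parity_csp(M, N, K, rng):
    """Toy instance: each of the M gates reads K distinct variables among N."""
    return [tuple(rng.sample(range(N), K)) for _ in range(M)]

def gate_output(y, vars_a):
    # parity gate: output 0 iff the number of 0's among its inputs is even
    return 0 if sum(1 - y[i] for i in vars_a) % 2 == 0 else 1

def energy(y, csp, x):
    """E(y): number of gates whose output differs from the control bit x_a."""
    return sum(gate_output(y, va) != xa for va, xa in zip(csp, x))

def encode(x, csp, N):
    """Exhaustive ground-state search over all 2^N words (toy sizes only)."""
    return min(product((0, 1), repeat=N), key=lambda y: energy(y, csp, x))

def decode(y, csp):
    """Reconstruct x*_a = gate output for every constraint a."""
    return [gate_output(y, va) for va in csp]

rng = random.Random(0)
M, N, K = 12, 6, 3  # rate R = N/M = 1/2, i.e. alpha = 2
csp = random_parity_csp(M, N, K, rng)
x = [rng.randint(0, 1) for _ in range(M)]
y = encode(x, csp, N)
x_star = decode(y, csp)
errors = sum(a != b for a, b in zip(x, x_star))
print("wrong bits:", errors, "violated constraints:", energy(y, csp, x))
```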

We shall measure the distortion as $D = \sum_{a=1}^{M} |x_a - x^*_a|/M$. We define the total "energy" of a configuration $y$ of the CSP as $E(y) = \sum_{a=1}^{M} \epsilon_a(y)$, where $\epsilon_a(y) = 0$ if the constraint $a$ is satisfied by the global configuration $y$, and $\epsilon_a(y) = 1$ otherwise. The distortion is then related to the ground-state energy $E_0$ of the CSP through

$$D = E_0/M \,. \qquad (2)$$

We are interested in the thermodynamic limit $N, M \to \infty$ at fixed density of constraints $\alpha$. Shannon's theorem provides a lower bound $E_{\rm Sh}(\alpha)$ to the ground-state energy, and a good compression algorithm should be based on a CSP with a very low ground-state energy, as close as possible to Shannon's value.

The coder based on parity-check gates (XORSAT-CSP) is a good candidate. A general strategy for computing the ground-state energy of this problem has been developed in […]. When all checks involve $K$ variables and $K$ becomes large, a computation based on this strategy shows that $E_0(\alpha) - E_{\rm Sh}(\alpha)$ decreases exponentially with $K$. So the theoretical capacity of these gates rapidly approaches Shannon's limit as $K$ increases. Unfortunately, no known algorithm matches this theoretical capacity. This is in contrast to the use of low-density parity-check codes for channel coding, where message-passing techniques are known to perform quite well. However, we shall see below that message passing does perform well on some other classes of gates.

Message passing. Useful techniques for solving random CSPs are based on local iterative updates of "messages" sent along the graph. For example, applying the 'Min-Sum' algorithm […] to CSPs, one obtains the Warning Propagation (WP) algorithm: each constraint $a$ sends a warning message $u_{a\to i}$ to a neighboring variable $i$ according to the values of the other variables attached to it. This message can be 0, meaning that the variable is free to assume any value [23], or 1, meaning that, in order to satisfy that constraint, the variable should assume a certain value. This algorithm is very powerful and efficient in many CSPs whose underlying factor graph is locally tree-like, when the density of constraints is small enough. However, it is limited in two respects: 1) it stops converging when the density of constraints is such that the system is in a clustered phase, and in particular in the unsat regime which is of interest for our compression scheme; 2) it does not work for parity checks because of the basic symmetry of these gates.
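The Warning Propagation rule can be written for an arbitrary gate given by its truth table. In the Python sketch below (an encoding of our own, with hypothetical names), a warning is represented by the value (0 or 1) that constraint $a$ forces on variable $i$, or by None when no warning is sent, which plays the role of $u_{a\to i} = 0$ in the text. The cavity state of a variable is the value forced by the warnings from its other constraints, or free if there are none; conflicting incoming warnings are simply treated as free here, which is a simplification. For a parity gate, a warning can only be emitted when all other inputs are fixed.

```python
import random
from itertools import product

def wp_warning(S, x_a, vars_a, i, state):
    """Warning from gate a = (vars_a, S) with control bit x_a to variable i.

    state[j] is the cavity value of each other input j: 0, 1, or None (free).
    Returns the value forced on i, or None if no warning is sent (this also
    covers the contradictory case where no value of i satisfies the gate).
    """
    feasible = set()
    for v in (0, 1):
        choices = [(v,) if j == i else ((0, 1) if state[j] is None else (state[j],))
                   for j in vars_a]
        # a configuration c satisfies the gate iff (c in S) == (x_a == 0)
        if any((c in S) == (x_a == 0) for c in product(*choices)):
            feasible.add(v)
    return feasible.pop() if len(feasible) == 1 else None

def warning_propagation(csp, x, N, max_iter=100, seed=0):
    """Iterate WP on csp = [(vars_a, S_a), ...]; return the warnings, or None
    if they have not converged after max_iter sweeps."""
    rng = random.Random(seed)
    nbrs = {i: [a for a, (va, _) in enumerate(csp) if i in va] for i in range(N)}
    u = {(a, i): rng.choice((0, 1, None)) for a, (va, _) in enumerate(csp) for i in va}
    for _ in range(max_iter):
        changed = False
        for a, (va, S) in enumerate(csp):
            for i in va:
                state = {}
                for j in va:
                    if j != i:
                        incoming = {u[(b, j)] for b in nbrs[j] if b != a} - {None}
                        # forced if all incoming warnings agree, free otherwise
                        state[j] = incoming.pop() if len(incoming) == 1 else None
                new = wp_warning(S, x[a], va, i, state)
                if new != u[(a, i)]:
                    u[(a, i)], changed = new, True
        if not changed:
            return u
    return None

# Toy example: a one-variable gate forces y_1 = 1, which propagates through a parity gate.
parity2 = frozenset(c for c in product((0, 1), repeat=2) if c.count(0) % 2 == 0)
csp = [((0, 1), parity2), ((1,), frozenset({(0,)}))]
u = warning_propagation(csp, x=[0, 1], N=2)
print(u)  # {(0, 0): 1, (0, 1): None, (1, 1): 1}
```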

The first limitation can be handled by moving to a more sophisticated message-passing algorithm, Survey Propagation (SP). The SP algorithm is the direct implementation of the 1RSB cavity equations on a single sample. In this case, one works with the probability distribution $Q_{a\to i}(u_{a\to i})$ that, if we pick one cluster at random, the warning $u_{a\to i}$ is sent along the link from constraint $a$ to variable $i$. This is the general object needed to cope with the appearance of disconnected clusters of solutions. Once we know the probabilities of all the warnings, we estimate the probability distribution of the total bias $H_i$ on the variable $i$, defined as $H_i = \sum_{a \in V(i)} u_{a\to i}$, where $V(i)$ is the set of constraints attached to the variable $i$. The idea behind Survey Inspired Decimation (SID) […] is to use this information to fix the most biased variable in the system. Once this is done, one has a simplified CSP with $N-1$ variables, and this step can be repeated until one is left with an under-constrained problem solvable by some standard local algorithm. This algorithm has been shown to be very useful in CSPs like coloring or $K$-SAT, where it efficiently finds a SAT assignment in the difficult region. It can also be used for problems with higher constraint density in order to find configurations of low energy (small number of violated constraints) […], which makes it a very useful tool for compression.
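The decimation loop itself can be sketched independently of how the biases are estimated. Implementing the generalized SP for these gates is beyond a short example, so the Python sketch below swaps in plain belief propagation (BP) at a finite inverse temperature $\beta$ (each violated gate costs a factor $e^{-\beta}$) to estimate the marginal of each variable: this is BP-guided decimation, not SID. The loop structure follows the description above (estimate biases, fix the most biased free variable, repeat on the reduced problem), except that it keeps decimating until every variable is fixed instead of handing the residual problem to a local algorithm. The value of $\beta$, the number of sweeps and all names are assumptions.

```python
import math
import random
from itertools import product

def bp_decimation(csp, x, N, beta=2.0, sweeps=50, seed=0):
    """BP-guided decimation on csp = [(vars_a, S_a), ...] with control bits x.

    Stand-in for SID: biases come from belief propagation on the Gibbs weight
    prod_a exp(-beta * eps_a(y)), not from survey propagation.
    """
    rng = random.Random(seed)
    nbrs = {i: [a for a, (va, _) in enumerate(csp) if i in va] for i in range(N)}
    m_fv = {(a, i): [0.5, 0.5] for a, (va, _) in enumerate(csp) for i in va}
    fixed = {}

    def var_to_fac(i, a):
        # cavity marginal of y_i without gate a (a point mass once i is fixed)
        if i in fixed:
            return [1.0 - fixed[i], float(fixed[i])]
        p = [1.0, 1.0]
        for b in nbrs[i]:
            if b != a:
                p = [p[0] * m_fv[(b, i)][0], p[1] * m_fv[(b, i)][1]]
        return [p[0] / (p[0] + p[1]), p[1] / (p[0] + p[1])]

    def fac_to_var(a, i):
        va, S = csp[a]
        pos = va.index(i)
        inc = {j: var_to_fac(j, a) for j in va if j != i}
        out = [0.0, 0.0]
        for c in product((0, 1), repeat=len(va)):
            w = 1.0 if (c in S) == (x[a] == 0) else math.exp(-beta)
            for j, yj in zip(va, c):
                if j != i:
                    w *= inc[j][yj]
            out[c[pos]] += w
        return [out[0] / sum(out), out[1] / sum(out)]

    while len(fixed) < N:
        edges = list(m_fv)
        for _ in range(sweeps):
            rng.shuffle(edges)
            for a, i in edges:
                m_fv[(a, i)] = fac_to_var(a, i)
        # full BP marginal of each free variable; fix the most biased one
        best, best_val, best_bias = None, 0, -1.0
        for i in range(N):
            if i not in fixed:
                p = [1.0, 1.0]
                for b in nbrs[i]:
                    p = [p[0] * m_fv[(b, i)][0], p[1] * m_fv[(b, i)][1]]
                p1 = p[1] / (p[0] + p[1])
                if abs(p1 - 0.5) > best_bias:
                    best, best_val, best_bias = i, int(p1 > 0.5), abs(p1 - 0.5)
        fixed[best] = best_val
    return [fixed[i] for i in range(N)]
```

On toy instances, the energy of the returned word can be compared with the exhaustive ground state to see how close the heuristic gets.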

Non-linear nodes. While it performs well on many CSPs, the SP algorithm is useless for the "parity source coder". The problem here comes from the fact that the distribution of the total bias is always symmetric, so the messages obtained after convergence give no hint of how to decimate the problem. To solve this problem, we propose a compression scheme based on other gates, different from parity checks. These turn out to have a theoretical capacity close to that of parity checks, and a generalized version of the SP algorithm [20] leads to a convergent decimation scheme which is an efficient coding algorithm. Among the several types of constraints we have examined, the "random" nodes have been found to be the most efficient. They are defined as follows: the subset $S_a$ is a randomly chosen subset of size $2^{K-1}$. While parity checks just implement linear constraints on $\mathbb{Z}_2$, these random nodes are non-linear functions of their inputs. However, from the point of view of the cavity method (used to compute their theoretical performance) and of the SP message-passing procedure (used as the encoding algorithm), they can be handled by relatively straightforward generalizations of the methods used for parity checks [20]. In this note we summarize the results.

FIG. 2: The ground-state energy of the CSP based on non-linear nodes versus the constraint density $\alpha$. Shannon's bound is also plotted for comparison.

We build up a list of $r$ random checks, and each constraint $a$ picks one check at random from this list [24]. This allows us to keep the truth tables of all nodes in memory and thus to speed up the algorithms. All the results quoted below are for $r = 30$. The theoretical capacity of this system, which is proportional to the ground-state energy according to Eq. (2), is illustrated in Fig. 2. As we clearly see, the ground-state energy quickly approaches the Shannon bound of Eq. (1) as $K$ increases. Thus, this particular CSP is very promising from the point of view of lossy data compression. In Fig. 3 we show the phase diagram for $K = 6$. The static energy is the ground-state energy per variable, also plotted in Fig. 2; from the algorithmic point of view, it is the performance of the best possible algorithm minimizing the number of violated constraints. The dynamical energy marks the appearance of a regime where solutions group into many different well-separated clusters (that is, going from one cluster to another requires flipping an extensive number of variables); any local algorithm, such as WP, will be trapped at this dynamical threshold. The dynamical energy computed here in the 1RSB approximation is believed to be an upper bound to the exact one. Finally, the stability curve […] indicates the range of validity of the 1RSB formalism used to determine this phase diagram (in particular, the ground-state energy which we compute should be the exact one for $\alpha < \alpha_{\rm 1RSB}$).
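The list of $r$ shared checks can be sketched as follows: each truth table is stored once as a tuple of $2^K$ output bits, and each gate only records which table it uses and which variables it reads. The sizes $M = 2000$ and $N = 1000$, the lexicographic indexing of configurations and all names are illustrative assumptions; $r = 30$ and $K = 6$ are the values used in the text.

```python
import random
from itertools import product

def build_pool(r, K, rng):
    """Hypothetical helper: r random balanced truth tables, each a tuple of
    2^K output bits (0 if the configuration is in S, 1 otherwise)."""
    configs = list(product((0, 1), repeat=K))
    pool = []
    for _ in range(r):
        S = set(rng.sample(configs, 2 ** (K - 1)))
        pool.append(tuple(0 if c in S else 1 for c in configs))
    return pool

def config_index(bits):
    """Position of a configuration in the lexicographic order of product()."""
    idx = 0
    for b in bits:
        idx = 2 * idx + b
    return idx

rng = random.Random(0)
r, K, M, N = 30, 6, 2000, 1000
pool = build_pool(r, K, rng)
gate_type = [rng.randrange(r) for _ in range(M)]          # table used by gate a
gate_vars = [rng.sample(range(N), K) for _ in range(M)]   # variables read by gate a

def output(a, y):
    """Reconstructed bit x*_a, read off the shared truth table of gate a."""
    return pool[gate_type[a]][config_index([y[i] for i in gate_vars[a]])]

print(r * 2 ** K, "table bits instead of", M * 2 ** K)  # 1920 table bits instead of 128000
```

With these numbers the shared tables hold $r \cdot 2^K = 1920$ bits, whereas storing one table per gate would take $M \cdot 2^K = 128000$ bits.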

So the theoretical properties of the random-node CSP are quite similar to those of the parity-check CSP (the XORSAT problem discussed in [8]). The advantage with respect to the parity-check CSP is that in this case the SID algorithm does converge in the unsat phase, in a time which scales as $2^K N \log N$, and gives very low energy states, i.e. nearly optimal global configurations. In Fig. 4 we
show the performance of the compressor based on random gates, for $K = 6$. The distortion achieved in practice by the algorithm with $N = 1000$ is close to the theoretical capacity, and thus a few % above the Shannon bound. As shown in Fig. 2, we expect the performance to improve with increasing $K$ (at the price of an increase in computer time).

FIG. 3: The phase diagram of a system with 30 different random nodes with $K = 6$. The values of the thresholds are $\alpha_d = 0.803$, $\alpha_s = 0.935$ and $\alpha_{\rm 1RSB} = 1.727$.

FIG. 4: The performance of the algorithm plotted versus the rate $R = 1/\alpha$ of the compression (here $K = 6$, $N = 1000$), together with the theoretical capacity and the Shannon bound.

Conclusions. We have shown, using techniques borrowed from the statistical physics of disordered systems, how CSPs can be used as a tool for compressing data. In particular, the algorithmic performance of the random gates (CSPs based on non-linear nodes) is found to be nearly optimal, since the Shannon bound is reached at large $K$. The generalization of the present approach to the compression of data from a larger alphabet (beyond binary input) is an interesting perspective.